{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 实例3：从Excel薪资总表中生成各部门的透视表"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "公司人事部的妹子经常需要给各部门经理提供其部门人员的情况，比如薪资、背景等。这些信息都是保密的，只能让部门经理知道，因此是不可能将总表发给他们，让他们自己筛选查看的。通常的做法就是在Excel中建个数据透视表，然后按不同经理或部门逐个筛选，复制粘贴到新的Excel文件中，分别发给各部门经理。这个纯体力活，让程序去干就好了。\n",
    "\n",
    "下面我们就来将公司的薪资按部门汇总，并自动生成各部门的Excel文件。代码总共只需要5行，非常简单高效。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>工号</th>\n",
       "      <th>姓名</th>\n",
       "      <th>部门</th>\n",
       "      <th>薪资</th>\n",
       "      <th>经理</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>701001</td>\n",
       "      <td>王波</td>\n",
       "      <td>工程部</td>\n",
       "      <td>8900</td>\n",
       "      <td>李飞</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>701002</td>\n",
       "      <td>刘海洋</td>\n",
       "      <td>生产部</td>\n",
       "      <td>6000</td>\n",
       "      <td>刘晓丽</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>701003</td>\n",
       "      <td>少和光</td>\n",
       "      <td>品质部</td>\n",
       "      <td>6000</td>\n",
       "      <td>张春香</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>701004</td>\n",
       "      <td>真凡巧</td>\n",
       "      <td>仓务部</td>\n",
       "      <td>4700</td>\n",
       "      <td>李国际</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>701005</td>\n",
       "      <td>鲁鸿运</td>\n",
       "      <td>船务部</td>\n",
       "      <td>5500</td>\n",
       "      <td>王明</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       工号   姓名   部门    薪资   经理\n",
       "0  701001   王波  工程部  8900   李飞\n",
       "1  701002  刘海洋  生产部  6000  刘晓丽\n",
       "2  701003  少和光  品质部  6000  张春香\n",
       "3  701004  真凡巧  仓务部  4700  李国际\n",
       "4  701005  鲁鸿运  船务部  5500   王明"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "df = pd.read_excel(\"data\\salary_info.xlsx\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先导入`pandas`库。Pandas是一个用于数据分析的功能非常强大的python包，是数据分析的必学必备工具。这些了解一下即可，总之它们就是一个工具，能为我所用，能解决问题行，无需深入了解，只需要了解我们要使用的部分就行了。\n",
    "\n",
    "为方便后续代码的书写，惯常做法是给导入的库起个“绰号”，比如`import pandas as pd`的意思是导入pandas库，并取绰号为\"pd\"，这样在后续的程序中直接用pd就可以代表pandas。\n",
    "\n",
    "然后我们使用`pd.read_excel`读取Excel文件，可直接将Excel文件读取为一个数据框(DataFrame)。可以理解数据框就是一张Excel表，请看上图，是不是很像我们常用的Excel表格呢？我们用`df.head()`看一下它的前5行，以便观察是否有正确读取。看起来是一切正常的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我们就按不同的经理来将这些数据拆分成不同的表格并单独保存，以便通过邮件的方式发给对应的经理。当然也可以按部门来拆分，原理是类似的。总表及拆分后的表如下图所示。可见拆分操作在不到1秒之内就全部完成了，堪称神速....\n",
    "![](images\\excel_cutting.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "完成！\n"
     ]
    }
   ],
   "source": [
    "#拆分Excel表，并单独保存\n",
    "managers=df[\"经理\"].unique()\n",
    "for manager in managers:\n",
    "    df[df[\"经理\"]==manager].to_excel(\"data\\salary_info_{}.xlsx\".format(manager),index=False)\n",
    "print(\"完成！\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "因为在总表中，经理的名下有很多员工，因此经理的名字是重复的，我们只需要获得所有经理的名字一次就够了，`unique()`就可以实现。`df[\"经理\"].unique()`可以理解为，从工作表中选取列名为“经理”的列，然后每个经理只取一次名字，然后存入manager这个数组。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['李飞', '刘晓丽', '张春香', '李国际', '王明', '朱柳峰'], dtype=object)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "managers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后我们将经理的名字逐个传入数据表，并逐个保存成新的Excel文件。`df.to_excel()`可实现将数据框存储为Excel文件。括号内的参数为需要存入的路径和文件名，此处我们希望把经理的名字放在文件名里面以便区分，所以使用格式化字符串函数`.format`将经理的名字传入文件名。我们不想在Excel中显示数据框的索引，因此`index`设为`False`。分拆后的Excel表如下图所示，分拆成功。\n",
    "![](images\\result.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于格式化字符串函数`.format`，可通过如下例子理解。即我们在前面字符串中放一个\"{}\"，然后`format()`括号内的“”"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "你好，世界！\n"
     ]
    }
   ],
   "source": [
    "print(\"你好，{}！\".format(\"世界\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
